Skip to content

[core] Support snapshot-based sequence ordering for primary-key tables#7832

Open
JunRuiLee wants to merge 8 commits into
apache:masterfrom
JunRuiLee:snapshot-ordering-v2
Open

[core] Support snapshot-based sequence ordering for primary-key tables#7832
JunRuiLee wants to merge 8 commits into
apache:masterfrom
JunRuiLee:snapshot-ordering-v2

Conversation

@JunRuiLee
Copy link
Copy Markdown
Contributor

Purpose

close #7806

Tests

  • SchemaValidationTest#testSnapshotSequenceOrderingHappyPath
  • SchemaValidationTest#testSnapshotSequenceOrderingRejectsSequenceField
  • SchemaValidationTest#testSnapshotSequenceOrderingRejectsNonPkTable
  • KeyValueWithLevelNoReusingSerializerSnapshotIdTest#testRoundTripWithSnapshotId
  • KeyValueWithLevelNoReusingSerializerSnapshotIdTest#testRoundTripWithoutSnapshotId
  • SortMergeSnapshotOrderingTest#testLaterSnapshotWinsOverHigherSequence
  • SortMergeSnapshotOrderingTest#testFallsBackToSequenceWhenSnapshotMissing
  • SortMergeSnapshotOrderingTest#testSameSnapshotFallsBackToSequence
  • SortMergeSnapshotOrderingTest#testStampedAlwaysBeatsUnstamped
  • PrimaryKeySimpleTableTest#testSnapshotSequenceOrdering
  • PrimaryKeySimpleTableTest#testSnapshotSequenceOrderingFallsBackToSequenceWithinSnapshot
  • PrimaryKeySimpleTableTest#testSnapshotSequenceOrderingCompactionPreservesInputSnapshotId
  • PrimaryKeySimpleTableTest#testSnapshotSequenceOrderingWithChangelogInput
  • PrimaryKeySimpleTableTest#testSnapshotSequenceOrderingWithChangelogLookup
  • PrimaryKeySimpleTableTest#testSnapshotSequenceOrderingDeleteFromLaterSnapshot

@JunRuiLee JunRuiLee force-pushed the snapshot-ordering-v2 branch from 36b0eaf to 2c737da Compare May 13, 2026 02:35
@JunRuiLee
Copy link
Copy Markdown
Contributor Author

Hi @JingsongLi, could you help take a look? Many thanks.

Comment thread paimon-core/src/main/java/org/apache/paimon/io/KeyValueDataFileRecordReader.java Outdated
Comment thread paimon-core/src/main/java/org/apache/paimon/operation/FileStoreCommitImpl.java Outdated
Comment thread paimon-core/src/main/java/org/apache/paimon/operation/FileStoreCommitImpl.java Outdated
@JunRuiLee
Copy link
Copy Markdown
Contributor Author

Thanks @leaves12138 for the review! Fixed the compaction ordering issue by persisting per-record snapshotId through _SEQUENCE_NUMBER column. Added tests for the scenario you described. Old constructor removed.

PTAL, Thanks!

@JunRuiLee JunRuiLee force-pushed the snapshot-ordering-v2 branch from 09ba5c9 to 37cc344 Compare May 14, 2026 07:35
Copy link
Copy Markdown
Contributor

@leaves12138 leaves12138 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the update. I took another careful pass over the snapshot-ordering implementation. I think there are still a few correctness issues to address before this can be safely merged.

Comment thread paimon-api/src/main/java/org/apache/paimon/CoreOptions.java
@JunRuiLee
Copy link
Copy Markdown
Contributor Author

Thanks @leaves12138 for the careful review.

I fixed the first two correctness issues:

  1. PartialUpdateMergeFunction and AggregateMergeFunction now preserve the winning input record’s snapshotId when returning a newly built KeyValue, so compaction no longer stamps UNKNOWN_SNAPSHOT_ID into
    _SEQUENCE_NUMBER.
  2. KeyValueBuffer now preserves snapshotId when snapshot ordering is enabled, so lookup compaction buffer spill does not lose it during binary serialization/deserialization.

I also added regression coverage for merge-function snapshotId preservation, partial-update compaction, aggregate compaction, and lookup buffer spill.

For the ALTER TABLE concern: this option is annotated as immutable, so enabling it via ALTER on a table with existing snapshots is rejected; empty-table ALTER remains allowed.

Copy link
Copy Markdown
Contributor

@leaves12138 leaves12138 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One remaining edge case around partial-update snapshot ordering.

@JunRuiLee JunRuiLee force-pushed the snapshot-ordering-v2 branch from 973e270 to 1b503a7 Compare May 18, 2026 11:46
@leaves12138
Copy link
Copy Markdown
Contributor

leaves12138 commented May 19, 2026

This feature is more complex than it is expected. I can't figure out more corner case, but there must exist some cases we have not catch. This looks good to me, but it need more experts to review this feature. @JingsongLi

JunRuiLee added 8 commits May 19, 2026 12:54
Snapshot ordering relies on snapshot id to determine record order,
but inline compaction happens before snapshot creation — files have
not been stamped with the correct snapshot id yet. Compaction would
propagate incorrect values to KV records.

Enforce write-only=true at schema validation time so that writers
use NoopCompactManager. Dedicated compact jobs override write-only
at runtime via table.copy(), which passes dynamicOptionKeys to skip
this specific check.
@JunRuiLee JunRuiLee force-pushed the snapshot-ordering-v2 branch from 18c93e8 to 9e16681 Compare May 19, 2026 04:59
@JunRuiLee
Copy link
Copy Markdown
Contributor Author

@leaves12138 Thanks for the review! I've reorganized the commits to make the progression clearer.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Feature] Support snapshot-based sequence ordering for primary-key tables

2 participants